The influence of the patchiness and correlations in the distribution of hydrophobic and polar residues 
at the interface between two rigid biomolecules on their recognition ability is investigated in idealised 
coarse-grained lattice models. A general two-stage approach is utilised where an ensemble of probe 
molecules is designed first and the recognition ability of the probe ensemble is related to the free 
energy of association with both the target molecule and a different rival molecule in a second step. 
The influence of correlation effects is investigated using numerical Monte Carlo techniques and 
mean field methods. Correlations lead to different optimum characteristic lengths of the hydrophobic 
and polar patches for the mutual design of the two biomolecules on the one hand and their recognition 
ability in the presence of other molecules on the other hand. 
I. INTRODUCTION 

An understanding of the basic principles of biomolecular recognition, that is the ability of a biomolecule to interact 
selectively with another molecule in the presence of structurally similar rival molecules, is not only important from a 
scientific point of view but also opens up a wide field of potential biotechnological applications [3, 0, 0| . The recognition 
process itself is governed by a complex interplay of non-covalent interactions such as salt bridges, hydrogen bonds, 
van der Waals and hydrophobic interactions. The typical intrinsic energy contribution of such an interaction is of 
the order of 1-2 kcal/mol and is thus only slightly larger than the thermal energy $k_B T_{\mathrm{room}} = 0.62$ kcal/mol at room 
temperature 0, 0]. In order to stabilise a complex of two proteins over a time long enough to ensure its biological 
function, many favourable interactions have to be established to overcome the entropic cost of the formation of the 
complex. Therefore, the two molecules have to complement each other at the common interface with respect to shape 
and interaction partners 0. This principle of complementarity is closely related to the lock-and-key view of rigid 
protein-protein recognition [3]. 

Molecular recognition results from an interplay of numerous competing and cooperating factors. Apart from the 
scenario of recognition between rigid proteins, recognition processes where at least one of the biomolecules undergoes 
conformational changes are also numerous in nature. Such recognition processes are described by the induced fit 
scheme 0. To understand the recognition process in full, one not only needs to consider the stability of a single 
specific complex, but also the encounter of the two biomolecules in the heterogeneous environment of the cell. For 
example, long-range electrostatic interactions are believed to pre-orient the biomolecules so that the probability of 
an encounter of the complementary patches on the two molecules upon collision is increased 0, [§] . Another critical 
aspect is the competition due to the simultaneous presence of different molecules. The more the binding free energy 
between complementary biomolecules differs from the binding free energy to other molecules, the lower is the risk of 
misrecognition. 

The recognition problem of two biomolecules shows up in different disguises in nature. To gain insight into this 
problem different approaches can be adopted. A detailed modelling (often on an atomistic level) of the biomolecules 
that form a complex gives many insights into the actual binding process between two specific biomolecules. In drug 
design docking methods allow the identification of the drug molecule with the optimum binding affinity for a known 
biomolecule. A second way to investigate the problem of molecular recognition is the use of coarse-grained models. The 
study of idealised coarse-grained and hence abstract generic models with methods from statistical physics seems to be 
particularly adequate for an understanding of the basic common physical mechanisms that govern different recognition 
processes in the heterogeneous environment of a cell. The coarse-graining approach is based on a reduction to the 
most relevant degrees of freedom for molecular recognition which helps to abstract from complications due to the 
intricate interplay of the involved types of interactions so that the generic features nature exploits for recognition can 
be identified [10]. This approach has been adopted in the literature to analyse various aspects of biomolecular binding 
and recognition for (almost) rigid and flexible biomolecules in idealised model systems [11, 12, 13, 14, 15, 16, 17, 18, 19]. 

One popular approach to study the basic principles of molecular recognition consists in investigating the adsorption 
of heteropolymers on patterned surfaces. Biomolecular recognition is then viewed in a first approximation as the 
adsorption of a biopolymer on the surface of another biopolymer. One major aspect addressed in this context deals 
with the question, whether or not length scale matching on the two polymers favours adsorption [13, E3, H3, 0, HH 
I25I [iBl • Generally speaking it was found that the adsorption properties depend on the involved types of correlations 
and that statistically structured surfaces (be it correlated or anticorrelated ones) have an enhanced affinity towards 
similarly structured chains although an exact matching of the corresponding correlation lengths is not necessary. The 
adsorption is followed by a second freezing transition where the flexible chain adjusts to the pattern of the surface 
which necessitates a more precise matching of the correlation lengths. Bogner et al. also addressed the role of 
correlations and found that biomolecular binding seems to be strongly influenced by small scale structures suggesting 
that local structure elements are particularly important for molecular recognition. 

The present study is in some sense complementary to those works. We investigate the influence of correlation 
effects on molecular recognition within coarse-grained models that are specifically designed to model the recognition 
between almost rigid proteins. In particular we focus on the role of the presence of competing rival molecules on the 
recognition characteristics. In our model correlations appear in the distribution of hydrophobic and polar residues on 
the surface of a biomolecule. These correlations result in extended patches of several hydrophobic and polar residues 
on the surface of the protein. The patterns of the actual target molecule and the rival molecules thereby exhibit the 
same characteristic correlation lengths. We then address the question about the optimum correlation length of the 
biomolecule that is supposed to recognise the target. All in all our analysis shows that a matching of the patterns on 
the surfaces is necessary to a certain degree in order to get optimum selectivity. However, the precise way in which the 
correlation lengths have to match depends on whether or not rival molecules are present, that is, on whether the isolated 
binding process or the actual recognition process in the presence of rival molecules is considered. We note also 
that in a recent study the effect of correlations that stem from the density of atoms on the surface of a biomolecule 
was considered in the context of connected proteins in protein interaction networks [13]. 

The present article is organised in the following way. In the next section our general approach to biomolecular 
recognition of two rigid proteins in the presence of rival molecules is briefly sketched (for a more detailed account, 
see [H, Hil). In the subsequent section [III] we discuss how correlations in the distribution of hydrophobic and polar 
residues can be incorporated into the model. In sections [IV] and [V] we then investigate the influence of sequence 
correlations on molecular recognition by using Monte Carlo techniques and mean field approximations. 



In this work we use coarse-grained idealised model systems to investigate the recognition of two biomolecules. 
Coarse-grained model systems contain a limited number of degrees of freedom and hence the recognition problem 
in its various disguises cannot be captured in its full scope. We limit our investigations to recognition processes 
that belong to the scenario of rigid protein-protein recognition and consider only the stabilisation of the complex. 
Dynamical aspects concerning the encounter of the two proteins in the cell and the formation of the complex are 
not incorporated. The generic model we use is built on observations of (universal) features of rigid protein-protein 
recognition so that the physics which different recognition processes have in common is captured in the model. 

We apply a coarse-grained point of view on the level of both the sequence of the amino acids on the so-called 
recognition sites of biomolecules at the mutual interface and the residue-residue interactions stabilising the complex. 
The backbones of the proteins are assumed to undergo no refolding during the association process. This is a justified 
assumption for most protein-protein recognition processes, although notable exceptions do exist B , Q | , [30|. Motivated 
by the observation that hydrophobicity is the major driving force in molecular recognition [2, [3. l30l . [311] we describe the 
type of the residue at the position i = 1, . . . , N of the recognition site by a binary variable [2a,[23| where one of the two 
values represents a hydrophobic residue and the other one a polar residue. Note that an eigenvalue decomposition of 
the Miyazawa-Jernigan matrix leads to an approximate parameterisation of residue-residue interactions by an Ising-like 
energy term with discrete variables that can take on two distinct values [32]. This gives additional justification to the 
use of HP-models for the residue-residue interactions. Denoting the type of the residue at position $i$ of the recognition 
site of one of the two molecules by $\sigma_i \in \{+1 \text{ (hydrophobic)}, -1 \text{ (polar)}\}$, the residue sequence on the recognition site 
with $N$ residues is then specified by $\sigma = (\sigma_1, \ldots, \sigma_N)$. Similarly, the type of residue at position $i$ of the recognition 
site of the interaction partner is specified by $\theta = (\theta_1, \ldots, \theta_N)$ with $\theta_i \in \{\pm 1\}$. 

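We then model the energetics at the two-dimensional contact interface of the two biomolecules by 

$$\mathcal{H}(\sigma, \theta; S) = -\varepsilon \sum_{i=1}^{N} \frac{1+S_i}{2}\,\sigma_i\theta_i - J \sum_{\langle i,j\rangle} S_i S_j \qquad (1)$$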

where the energy contributions of the contact between two residues across the interface are summed up. The variable 
$S_i$ takes on the two discrete values $\pm 1$ and describes the fit of the shape of the molecules at position $i$ of the interface; 
for a poor fit, i.e. $S_i = -1$, we assume no contribution to the stabilising energy. The variable $S$ models the influence 
of a (local) rearrangement of the amino acid side chains on a microscopic level when the complex is formed [2I I9L l3(i| . 
Note that such rearrangements are observed even if the tertiary structures of the proteins remain unaltered upon 
complex formation. Apart from the direct contact energy with strength $\varepsilon$, the model Hamiltonian (1) contains an 
additional cooperative interaction term where the quality of a residue-residue contact couples to the structure in 
its neighbourhood. This term has the effect that a locally good fit at some position in the interface influences its 
neighbourhood [29]. 
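
As a minimal sketch of how such an interface energy can be evaluated, the following Python function assumes a contact term of the form $-\varepsilon \sum_i \frac{1+S_i}{2}\sigma_i\theta_i$ and a nearest-neighbour cooperative term $-J \sum_{\langle i,j\rangle} S_i S_j$ on a square lattice with free boundaries. The function name `interface_energy`, the array layout and the boundary condition are illustrative assumptions and are not taken from the text.

```python
import numpy as np

def interface_energy(sigma, theta, S, eps=1.0, J=0.0):
    """Interface energy for two-dimensional arrays of +/-1 values.

    Assumed form (an illustration, not necessarily the exact model):
      H = -eps * sum_i (1 + S_i)/2 * sigma_i * theta_i - J * sum_<ij> S_i * S_j
    with free (non-periodic) boundaries for the neighbour sum.
    """
    sigma, theta, S = (np.asarray(a) for a in (sigma, theta, S))
    # a contact contributes only where the local fit is good (S_i = +1)
    contact = -eps * np.sum((1 + S) / 2 * sigma * theta)
    # every nearest-neighbour bond is counted once (horizontal + vertical)
    bonds = np.sum(S[:, :-1] * S[:, 1:]) + np.sum(S[:-1, :] * S[1:, :])
    return contact - J * bonds

sigma = np.array([[1, 1], [-1, -1]])      # target site: +1 hydrophobic, -1 polar
theta = np.array([[1, 1], [-1, 1]])       # probe site with one mismatch
S_good = np.ones((2, 2), dtype=int)       # good local fit everywhere
S_bad = -np.ones((2, 2), dtype=int)       # poor local fit everywhere

e_good = interface_energy(sigma, theta, S_good)   # three matches, one mismatch: -2.0
e_bad = interface_energy(sigma, theta, S_bad)     # no stabilising contribution
```

With a poor fit at every position the contact term vanishes, so only the cooperative term (if $J \neq 0$) distinguishes configurations of $S$.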

In our idealised view of the interface each biomolecule contributes the same number $N$ of coarse-grained 
"residues". This assumption is questionable for real interfaces; particularly for curved interfaces different numbers of 
amino acids appear [§]. In Hamiltonian (1) a residue of one of the biomolecules interacts with precisely one residue 
on the other molecule. This simplified assumption is also not valid for real residues, in particular as different amino 
acids are of different sizes so that a large residue can interact with several smaller amino acids. However, one can 
think of a general partition of the interface in N contact patches of the same size on each of the biomolecules where 
larger amino acids contribute to several patches whereas small ones only to a few. A value of the hydrophobicity 
can then be attributed to each of the patches on the biomolecules. Within such a description the (free) energies can 
be approximated by the model (1). For the sake of simplicity, however, we stick to the expression "residue" in the 
following discussions. We also note that solvation effects at the recognition sites and the associated entropy changes 
are crucial when the complex of two biomolecules is formed [13, S|- In the adopted coarse-grained approach, however, 
it is assumed that all these contributions are of comparable size for all proteins under consideration. Notice also that 
by reducing the interactions to the hydrophobic effect solvation effects are already partially included in HP-like models 
(on a formal level due to integrating out the solvent degrees of freedom resulting in effective interaction constants like 
$\varepsilon$ in (1)). 

To study the recognition process between two rigid proteins we adopt a two-stage approach. For a fixed target 
sequence $\sigma^{(t)}$ we first design an ensemble of probe molecules $\theta$ at a design temperature $1/\beta_D$ in such a way that the 
sequence $\theta$ should optimise the interface energy. This design by equilibration leads to the distribution 
$P(\theta|\sigma^{(t)}) = \frac{1}{Z_D} \sum_S \exp\left(-\beta_D \mathcal{H}(\sigma^{(t)}, \theta; S)\right)$. This first step should mimic evolutionary processes or the design of artificial molecules 
in biotechnological applications. The quality of the design can be quantified by evaluating the average $\langle K \rangle_{P(\theta|\sigma^{(t)})}$ 
of the overlap $K = \sum_i \sigma^{(t)}_i \theta_i$ of the sequence of the probe molecules with the previously fixed target sequence. A 
large $\langle K \rangle_{P(\theta|\sigma^{(t)})}$ then signals a high complementarity of the two recognition sites in regard to the actual recognition 
process of the two proteins. Notice that $\langle K \rangle_{P(\theta|\sigma^{(t)})}$ generally depends on the particular chosen target sequence 
$\sigma^{(t)}$. 

In a second step the free energy difference of association at temperature $1/\beta$ is calculated for the interaction of the 
probe ensemble with the target molecule $\sigma^{(t)}$ on the one hand and a structurally different rival molecule $\sigma^{(r)}$ on the 
other hand. In this step the free energy of the interaction 

$$F(\theta|\sigma^{(\alpha)}) = -\frac{1}{\beta} \ln \sum_S \exp\left(-\beta \mathcal{H}(\sigma^{(\alpha)}, \theta; S)\right) \qquad (2)$$

of the molecule $\alpha \in \{t \text{ (target)}, r \text{ (rival)}\}$ with a particular probe sequence $\theta$ has to be averaged with respect 
to the distribution $P(\theta|\sigma^{(t)})$, giving $F^{(\alpha)} = \langle F(\theta|\sigma^{(\alpha)}) \rangle_{P(\theta|\sigma^{(t)})}$. This leads finally to the free energy difference 
$\Delta F(\sigma^{(t)}, \sigma^{(r)}) = F^{(t)} - F^{(r)}$. In order to assess the recognition ability of the system the free energy difference $\Delta F$ is 
then averaged over all possible target and rival sequences on their respective recognition sites: 

$$[\Delta F]_{\sigma^{(t)},\sigma^{(r)}} = \sum_{\sigma^{(t)},\sigma^{(r)}} W^{(t)}(\sigma^{(t)})\, W^{(r)}(\sigma^{(r)})\, \Delta F(\sigma^{(t)}, \sigma^{(r)}) \qquad (3)$$

where the $W^{(\alpha)}$ denote the distributions of the sequence of the target and rival molecules, respectively. A negative 
$[\Delta F]_{\sigma^{(t)},\sigma^{(r)}}$ then signals an overall preferential interaction of the probe molecule with the target leading to the desired 
selectivity of the recognition process. In the following discussions square brackets indicate an average over all possible 
target and rival sequences whereas angular brackets denote an average over the designed ensemble of probe molecules. 
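
The two stages described above can be illustrated by exact enumeration for a very small recognition site. The following Python sketch assumes a one-dimensional chain of $N$ residues with a contact term $-\varepsilon \sum_i \frac{1+S_i}{2}\sigma_i\theta_i$ and an optional cooperative term $-J \sum_i S_i S_{i+1}$. The function names `log_trace_S` and `recognition`, the chain geometry and the default parameters are hypothetical choices made for illustration only.

```python
import itertools
import numpy as np

def log_trace_S(sigma, theta, beta, eps=1.0, J=0.0):
    """ln sum_S exp(-beta * H) for a chain of N sites (assumed model form)."""
    N = len(sigma)
    total = 0.0
    for S in itertools.product((-1, 1), repeat=N):
        contact = -eps * sum((1 + s) / 2 * a * b for s, a, b in zip(S, sigma, theta))
        coop = -J * sum(S[i] * S[i + 1] for i in range(N - 1))
        total += np.exp(-beta * (contact + coop))
    return np.log(total)

def recognition(sigma_t, sigma_r, beta=1.0, beta_d=1.0, eps=1.0, J=0.0):
    """Return (<K>, Delta F) for one target/rival pair by exact enumeration."""
    N = len(sigma_t)
    probes = list(itertools.product((-1, 1), repeat=N))
    # design step: P(theta | sigma_t) proportional to sum_S exp(-beta_d * H)
    logw = np.array([log_trace_S(sigma_t, th, beta_d, eps, J) for th in probes])
    P = np.exp(logw - logw.max())
    P /= P.sum()
    # testing step: F(theta | sigma) = -(1/beta) * ln sum_S exp(-beta * H)
    F_t = np.array([-log_trace_S(sigma_t, th, beta, eps, J) / beta for th in probes])
    F_r = np.array([-log_trace_S(sigma_r, th, beta, eps, J) / beta for th in probes])
    # overlap of every probe sequence with the target sequence
    K = np.array([np.dot(sigma_t, th) for th in probes])
    return float(P @ K), float(P @ (F_t - F_r))

# one target and one structurally different rival on a site with N = 4 residues
K_avg, dF = recognition((1, 1, -1, -1), (1, -1, 1, -1))
```

For $J = 0$ the sites decouple and the enumeration reproduces $\langle K \rangle = N \tanh(\beta\varepsilon/2)$. A negative `dF` indicates that the designed probe ensemble binds preferentially to the target. The final average over target and rival sequences would require an additional loop over sequences drawn from the respective distributions.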

Our approach can be roughly illustrated by the technologically relevant case of developing a drug molecule with a 
high affinity to a particular protein. The target molecule of our terminology corresponds to a known protein which 
is responsible for a disease, for example, with a well-located recognition site. Our design step then corresponds to 
finding the most suitable drug molecule called probe in our nomenclature. The subsequent testing step then models 
the administration of the drug to an organism where additional proteins (rival molecules) are present apart from 
the known protein the drug molecule is supposed to bind to, so that all these proteins can compete for the drug 
molecules. 
III. INCORPORATING SEQUENCE CORRELATIONS 

In Hamiltonian (1) only the energetics of the contact interactions of residues across the interface between the 
two interacting molecules is taken into account. However, the residues that constitute the recognition sites on the 
proteins also interact with each other, so that different sequences result in different contributions to the total energy. 
Non-covalent hydrophobic-polar contacts between neighbouring residues in the recognition sites, for example, lead to 
unfavourable energy contributions. As a consequence patches of several hydrophobic or polar residues are likely to 
show up. Thus the probability of having a certain type of residue at position i, say, in the recognition site depends 
on the type of the residues in the neighbourhood of i, so that the sequences are correlated. Indeed the appearance of 
patches of residues of a similar hydrophobicity can be observed in the majority of protein-protein interfaces [35]. 

On a formal level, correlations can be incorporated by introducing, apart from the contact energy $\mathcal{H}_{\mathrm{int}}$ at the 
interface, an additional correlation term $\mathcal{H}_{\mathrm{cor}}$ to the Hamiltonian. Note that in principle correlation energies also 
show up in the interior of the proteins and in turn induce correlations on the surface of the molecules. In this work, 
however, we are only concerned with the interaction between two proteins which depends on the nature of the residues 
that constitute the recognition sites. We thus do not consider these further distributions of interior (or other surface) 
residues explicitly. 

Focusing on the sequence $\theta$ of the probe molecules for the discussion we consider the following correlation energy: 

$$\mathcal{H}_{\mathrm{cor}}(\theta) = -\gamma_p \sum_{\langle i,j\rangle} \theta_i\theta_j - \mu_p \sum_i \theta_i \qquad (4)$$

The first sum extends over all neighbouring residues in contact and hence represents the interactions due to hydrophobicity, 
so that the associated parameter $\gamma_p$ controls the corresponding (nearest-neighbour) correlations. These 
correlation interactions lead to the formation of extended patches of either hydrophobic or polar residues in the recognition 
sites. The characteristic extensions of these patches can be interpreted as a measure of the correlation length 
$\lambda_p$. In the second contribution the hydrophobicity of the recognition site couples to the parameter $\mu_p$, which therefore 
controls the overall number of hydrophobic residues. The design step then gives the probability of a certain probe 
sequence $\theta$ for a given target sequence $\sigma^{(t)}$. This probability distribution for the probe ensemble is then generally 
given by 

$$P(\theta|\sigma^{(t)}) = \frac{1}{\mathcal{N}} \sum_S \exp\left(-\beta_D \mathcal{H}_{\mathrm{int}} - \mathcal{H}_{\mathrm{cor}}\right) \qquad (5)$$

where $\mathcal{N}$ denotes the normalisation. In general this probability depends on the particular sequence $\sigma^{(t)}$ of the 
recognition site of the given target. Note that the contributions from the correlation energy are considered not to be 
subjected to thermal fluctuations as only the rearrangement variable $S$ is assumed to equilibrate. 

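After the average over the probe ensemble has been carried out the free energy difference $\Delta F(\sigma^{(t)}, \sigma^{(r)})$ for a 
given target-rival pair depends on the parameters $\gamma_p$ and $\mu_p$. For the final average over the possible target and rival 
molecules, sequences with particular correlation properties are considered. Formally the corresponding probability 
distributions for $\alpha \in \{t \text{ (target)}, r \text{ (rival)}\}$ are given by 

$$W^{(\alpha)}(\sigma^{(\alpha)}) \sim \exp\left(-\mathcal{H}^{(\alpha)}_{\mathrm{cor}}(\sigma^{(\alpha)})\right) \qquad (6)$$

with associated parameters $\gamma_\alpha$ for the (nearest-neighbour) correlations and $\mu_\alpha$ for the overall hydrophobicity: 

$$\mathcal{H}^{(\alpha)}_{\mathrm{cor}}(\sigma^{(\alpha)}) = -\gamma_\alpha \sum_{\langle i,j\rangle} \sigma^{(\alpha)}_i\sigma^{(\alpha)}_j - \mu_\alpha \sum_i \sigma^{(\alpha)}_i \qquad (7)$$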

For the investigation of the influence of sequence correlations on molecular recognition in our model we adopted 
the following strategy. For a fixed pair of target and rival sequences the probe ensemble is generated for the 
parameters $\gamma_p$ and $\mu_p$, which in turn determine the correlation length $\lambda_p$. Note that the generated probe molecules 
are not perfect with respect to the target molecule due to evolutionary processes leading to defects. Then the 
recognition ability is assessed by evaluating the free energy difference $\Delta F(\sigma^{(t)}, \sigma^{(r)})$ for the given target-rival pair. 
This free energy difference is then averaged over all possible target-rival pairs, where similarly to the probe molecule 
the associated parameters $\gamma_\alpha$ and $\mu_\alpha$ determine the correlation lengths $\lambda_\alpha$. By this approach the overall recognition 
ability $[\Delta F]_{\sigma^{(t)},\sigma^{(r)}}(\lambda_t, \lambda_r, \lambda_p)$ is hence computed as a function of the correlation lengths (and hydrophobicities) of 
the target and rival molecules and of the predesigned probe molecules. For given correlation lengths $\lambda_t$ and $\lambda_r$ of the 
target and rival molecules, respectively, the correlation length $\lambda_p$ of the probe molecules is then varied to find the 
optimum recognition ability. 
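
Correlated target or rival sequences of the kind used in this strategy can be generated, for example, by single-site Metropolis updates. The sketch below assumes a dimensionless correlation energy of the form $-\gamma \sum_{\langle i,j\rangle} \sigma_i\sigma_j - \mu \sum_i \sigma_i$ on an $L \times L$ lattice with free boundaries. The function name `sample_sequence`, the sign conventions, the boundary condition and the number of sweeps are assumptions made for illustration.

```python
import numpy as np

def sample_sequence(L, gamma, mu, sweeps=200, rng=None):
    """Metropolis sample from W(s) ~ exp(gamma * sum_<ij> s_i s_j + mu * sum_i s_i).

    L x L recognition site with free boundaries; the signs of gamma and mu
    follow an assumed convention (gamma > 0 favours patches).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = rng.choice(np.array([-1, 1]), size=(L, L))
    for _ in range(sweeps * L * L):
        i, j = rng.integers(L, size=2)
        # sum over the existing nearest neighbours of site (i, j)
        nb = 0
        if i > 0:
            nb += s[i - 1, j]
        if i < L - 1:
            nb += s[i + 1, j]
        if j > 0:
            nb += s[i, j - 1]
        if j < L - 1:
            nb += s[i, j + 1]
        # change of the dimensionless correlation energy if s[i, j] is flipped
        dH = 2 * s[i, j] * (gamma * nb + mu)
        if dH <= 0 or rng.random() < np.exp(-dH):
            s[i, j] = -s[i, j]
    return s

# a weakly correlated sequence without a bias in the hydrophobicity
target = sample_sequence(8, gamma=0.4, mu=0.0, sweeps=50, rng=np.random.default_rng(0))
```

Increasing `gamma` produces larger patches, while `mu` shifts the average hydrophobicity of the generated sequences.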
IV. UNCOOPERATIVE MODEL 



The interaction energy (1) at the interface between the two proteins comprises, apart from the direct contact 
contributions due to hydrophobicity, additional cooperative terms where the rearrangements of neighbouring amino 
acid side chains couple to each other. In this section we set the corresponding interaction constant $J$ to zero and 
consider only the direct hydrophobic energy contributions. The total Hamiltonian for the interface energy between a 
molecule with the sequence $\sigma$ and the probe molecule $\theta$ thus reads 

$$\mathcal{H}(\sigma, \theta; S) = \mathcal{H}_{\mathrm{int}} + \frac{1}{\beta}\mathcal{H}_{\mathrm{cor}} = -\varepsilon \sum_{i=1}^{N} \frac{1+S_i}{2}\,\sigma_i\theta_i - \frac{\gamma_p}{\beta} \sum_{\langle i,j\rangle} \theta_i\theta_j - \frac{\mu_p}{\beta} \sum_i \theta_i \qquad (8)$$

As the interaction variable $S_i$ at position $i$ does not couple to the variables at other positions $j \neq i$ of the interface, 
the corresponding thermal average can be carried out, resulting in an effective Hamiltonian that depends only on the 
sequence variables. Including the contributions from the correlation energies it is given by 

$$\mathcal{H}_{\mathrm{eff}}(\sigma, \theta) = -\frac{\varepsilon}{2} \sum_i \sigma_i\theta_i - \frac{\gamma_p}{\beta} \sum_{\langle i,j\rangle} \theta_i\theta_j - \frac{\mu_p}{\beta} \sum_i \theta_i + \mathrm{const.} \qquad (9)$$

Here we have used the fact that $\cosh(\beta\varepsilon\sigma_i\theta_i) = \cosh(\beta\varepsilon)$ for all choices of $\sigma_i$ and $\theta_i$. The constant in (9) is temperature 
dependent; however, as we are only concerned with the effect of correlations on the molecular recognition ability, we 
fix the temperature and thus can omit the constant. The free energy for the interaction between the sequences 
$\sigma$ and $\theta$ is $F(\theta|\sigma) = -\frac{\varepsilon}{2} \sum_i \sigma_i\theta_i + \frac{1}{\beta}\mathcal{H}_{\mathrm{cor}}(\theta)$ and can now be averaged over the possible probe sequences that are 
distributed according to the probability $P(\theta|\sigma^{(t)}) \sim \exp\left(-\beta \mathcal{H}_{\mathrm{eff}}(\sigma^{(t)}, \theta)\right)$. Note that the design might be carried out 
at an inverse temperature $\beta_D$ which is different from the inverse temperature $\beta$ at which the selectivity is determined. However, we 
are not interested in the effect of a temperature variation in this work and therefore choose $\beta_D = \beta$. The correlation 
energy $\mathcal{H}_{\mathrm{cor}}$ does not explicitly depend on the sequence $\sigma$ and hence, when computing the free energy difference 
between the interaction of the target molecule with the probe ensemble on the one hand and the interaction of the 
rival molecule with the probe ensemble on the other hand, these correlation contributions cancel and one ends up with 

$$\Delta F(\sigma^{(t)}, \sigma^{(r)}) = -\frac{\varepsilon}{2} \sum_i \left(\sigma^{(t)}_i - \sigma^{(r)}_i\right) \langle\theta_i\rangle_{P(\theta|\sigma^{(t)})} . \qquad (10)$$

The free energy difference is hence determined by the difference of the complementarity of the probe ensemble with 
the target sequence on the one hand and the complementarity of the probe ensemble with the rival sequence on the 
other hand. Note also that the free energy difference exhibits a dependence on the correlation parameters $\gamma_p$ and $\mu_p$ 
(which enter the distribution $P$ and hence influence the average hydrophobicity at position $i$ of the recognition site 
of the probe molecule) and thus on the correlation length $\lambda_p$. 
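
The thermal average over the fit variables that leads from the uncooperative Hamiltonian to the effective Hamiltonian (9) can be written out explicitly for a single site. The following lines are a sketch that assumes the contact term $-\varepsilon\,\frac{1+S_i}{2}\,\sigma_i\theta_i$; the sequence-independent logarithm is the temperature dependent constant mentioned above.

```latex
\begin{align*}
\sum_{S_i=\pm 1} \exp\!\Big(\beta\varepsilon\,\tfrac{1+S_i}{2}\,\sigma_i\theta_i\Big)
  &= 1 + e^{\beta\varepsilon\sigma_i\theta_i}
   = 2\, e^{\beta\varepsilon\sigma_i\theta_i/2}\,
     \cosh\!\Big(\tfrac{\beta\varepsilon}{2}\,\sigma_i\theta_i\Big), \\
-\frac{1}{\beta}\ln \sum_{S_i=\pm 1} \exp\!\Big(\beta\varepsilon\,\tfrac{1+S_i}{2}\,\sigma_i\theta_i\Big)
  &= -\frac{\varepsilon}{2}\,\sigma_i\theta_i
     - \frac{1}{\beta}\ln\!\Big[2\cosh\!\Big(\tfrac{\beta\varepsilon}{2}\Big)\Big].
\end{align*}
```

The second line uses that $\cosh$ is even and that $\sigma_i\theta_i = \pm 1$, so the logarithmic term does not depend on the sequences. Summing over all sites gives the contact contribution $-\frac{\varepsilon}{2}\sum_i \sigma_i\theta_i$ plus a constant.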

To assess the overall recognition ability the free energy difference (10) has to be averaged over all target and rival 
sequences which are distributed with respect to (6) with correlation Hamiltonians of the form (4). As the target and 
the rival sequences are independent of each other, the averaged free energy difference is therefore given by 

$$[\Delta F] = -\frac{\varepsilon}{2} \sum_i \left[\sigma^{(t)}_i \langle\theta_i\rangle_{P(\theta|\sigma^{(t)})}\right]_{W^{(t)}} + \frac{\varepsilon}{2} N h_p h_r \qquad (11)$$

$$\phantom{[\Delta F]} = -\frac{\varepsilon}{2} \left[\langle K\rangle_{P(\theta|\sigma^{(t)})}\right]_{W^{(t)}} + \frac{\varepsilon}{2} N h_p h_r \qquad (12)$$

in terms of the complementarity of the probe ensemble and hydrophobicities $h_p$ and $h_r$ of the probe and rival molecule, 
respectively. The second term originates from the interaction of the probe molecules with the rival molecule. It is only 
determined by the respective hydrophobicities of the molecules and is independent of the structure elements related 
to the hydrophobic and polar patches of the recognition sites. Note that the hydrophobicity $h_p$ hinges on the sequence 
of the target molecule. The first term stems from interactions of the probe molecule with the target molecule. This 
term depends sensitively on an appropriate matching of the structure elements on the recognition sites and is hence 
directly influenced by correlation effects in the corresponding distributions of the hydrophobicity. 
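
The reduction of the rival contribution to a product of hydrophobicities can be spelled out as follows. This is a sketch under the assumptions that the hydrophobicities are normalised per site, $h_p = \frac{1}{N}\sum_i [\langle\theta_i\rangle_{P(\theta|\sigma^{(t)})}]_{W^{(t)}}$ and $h_r = [\sigma^{(r)}_i]_{W^{(r)}}$, and that the rival distribution is translationally invariant so that $h_r$ does not depend on $i$. These definitions are inferred from the surrounding text rather than quoted from it.

```latex
\begin{align*}
\frac{\varepsilon}{2}\sum_i
  \Big[\big[\sigma^{(r)}_i\,\langle\theta_i\rangle_{P(\theta|\sigma^{(t)})}\big]_{W^{(r)}}\Big]_{W^{(t)}}
  &= \frac{\varepsilon}{2}\sum_i
     \big[\sigma^{(r)}_i\big]_{W^{(r)}}\,
     \big[\langle\theta_i\rangle_{P(\theta|\sigma^{(t)})}\big]_{W^{(t)}} \\
  &= \frac{\varepsilon}{2}\, h_r \sum_i
     \big[\langle\theta_i\rangle_{P(\theta|\sigma^{(t)})}\big]_{W^{(t)}}
   = \frac{\varepsilon}{2}\, N\, h_p\, h_r .
\end{align*}
```

The first equality holds because the probe ensemble is designed with respect to the target only and is therefore independent of the rival sequence.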

In the following subsections we use two methods to carry out the remaining averages in (12), namely numerical 
Monte Carlo techniques and a mean field approximation. Larsen et al. reported that basically two types of interfaces 
appear in protein-protein complexes [35]. In the minority of complexes the interface has a hydrophobic core which 
consists of a single large patch and which is surrounded by a rim of polar interactions with residual accessibility by 
solvent molecules. For the majority of complexes, however, the interface is made up by a mixture of small hydrophobic 
patches and polar interactions. We thus focus in the following discussions only on the situation where the correlation 
lengths of the target and rival molecule, respectively, are relatively small compared to the extension of the interface. 



A. Numerical results 

The remaining averages in expression (12) of the free energy difference, first over the probe ensemble with 
the distribution $P(\theta|\sigma^{(t)})$ and then over the target sequences with the distribution $W^{(t)}$, can be carried out 
numerically by means of Monte Carlo methods. For a given target and rival sequence the quantities of interest 
(averaged complementarity and free energy difference as a measure for selectivity) are computed first. Then the 
final average over the target sequences with fixed parameters $\gamma_t$ and $\mu_t$ (and hence fixed correlation length $\lambda_t$ and 
hydrophobicity $h_t$) is evaluated. As we are interested in the recognition ability of the system if the rival molecule is 
structurally very similar to the target molecule, the same correlation parameters are used for the average over the 
rival sequences and thus one has in particular $h_r = h_t$. 
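
The inner average over the probe ensemble can be sketched with a simple Metropolis scheme. The code below assumes the effective weight $\exp\left(\frac{\beta\varepsilon}{2}\sum_i \sigma^{(t)}_i\theta_i + \gamma_p \sum_{\langle i,j\rangle}\theta_i\theta_j + \mu_p \sum_i \theta_i\right)$ on a square lattice with free boundaries and evaluates the free energy difference per site from the site averages $\langle\theta_i\rangle$. The function names `design_probe_mean` and `delta_F_per_site`, the update scheme and the run lengths are illustrative assumptions, not the simulation protocol of this work.

```python
import numpy as np

def design_probe_mean(sigma_t, beta_eps=1.0, gamma_p=0.0, mu_p=0.0,
                      sweeps=400, burn_in=100, rng=None):
    """Estimate <theta_i> for probes designed against the target sigma_t."""
    rng = np.random.default_rng() if rng is None else rng
    L = sigma_t.shape[0]
    theta = sigma_t.copy()              # start from the fully complementary probe
    acc = np.zeros(sigma_t.shape)
    for sweep in range(sweeps):
        for _ in range(L * L):
            i, j = rng.integers(L, size=2)
            nb = 0
            if i > 0:
                nb += theta[i - 1, j]
            if i < L - 1:
                nb += theta[i + 1, j]
            if j > 0:
                nb += theta[i, j - 1]
            if j < L - 1:
                nb += theta[i, j + 1]
            # change of beta * H_eff when theta[i, j] is flipped
            d = 2 * theta[i, j] * (0.5 * beta_eps * sigma_t[i, j] + gamma_p * nb + mu_p)
            if d <= 0 or rng.random() < np.exp(-d):
                theta[i, j] = -theta[i, j]
        if sweep >= burn_in:
            acc += theta
    return acc / (sweeps - burn_in)

def delta_F_per_site(sigma_t, sigma_r, theta_mean, eps=1.0):
    """Free energy difference for one target/rival pair, divided by N."""
    return -0.5 * eps * np.mean((sigma_t - sigma_r) * theta_mean)

rng = np.random.default_rng(1)
sigma_t = rng.choice(np.array([-1, 1]), size=(8, 8))   # uncorrelated target
sigma_r = -sigma_t                                     # maximally different rival
theta_mean = design_probe_mean(sigma_t, sweeps=300, burn_in=50, rng=rng)
k_per_site = np.mean(sigma_t * theta_mean)             # complementarity per site
dF = delta_F_per_site(sigma_t, sigma_r, theta_mean)
```

For `gamma_p = 0` and `mu_p = 0` the sites are independent and the complementarity per site should fluctuate around $\tanh(\beta\varepsilon/2)$, which offers a simple consistency check. The outer average would repeat this procedure for many target and rival sequences with the chosen correlation parameters.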

The probe molecules are designed for different correlation parameters $\gamma_p$. The probe sequence is optimised with 
respect to the target sequence; thus we do not further restrict the hydrophobicity and therefore set $\mu_p = 0$. The 
correlation parameter $\gamma_p$ can therefore be directly converted into the correlation length $\lambda_p$. The (pseudo-) correlation 
length for recognition sites of a finite extension is computed to be the average size of clusters that are made up of 
neighbouring residues of the same type. In the following figures the shown correlation length $\lambda_p$ is normalised in such 
a way that its maximum possible value is one for a system where the whole recognition site is made up of precisely 
one cluster with either hydrophobic or polar residues. 
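
A cluster-based correlation length of this kind can be computed with a breadth-first search over equal neighbouring residues. The precise normalisation is not spelled out here, so the sketch below assumes that the square root of the mean cluster size is divided by the linear extension $L$. This choice gives one for a single cluster and $1/L$ for a checkerboard pattern, in line with the limits quoted in this section, but it remains an assumption. The function names are hypothetical.

```python
import numpy as np
from collections import deque

def cluster_sizes(s):
    """Sizes of nearest-neighbour clusters of equal residues (free boundaries)."""
    s = np.asarray(s)
    n_rows, n_cols = s.shape
    seen = np.zeros(s.shape, dtype=bool)
    sizes = []
    for i in range(n_rows):
        for j in range(n_cols):
            if seen[i, j]:
                continue
            # breadth-first search over the cluster that contains site (i, j)
            seen[i, j] = True
            queue, size = deque([(i, j)]), 0
            while queue:
                a, b = queue.popleft()
                size += 1
                for c, d in ((a - 1, b), (a + 1, b), (a, b - 1), (a, b + 1)):
                    if (0 <= c < n_rows and 0 <= d < n_cols
                            and not seen[c, d] and s[c, d] == s[i, j]):
                        seen[c, d] = True
                        queue.append((c, d))
            sizes.append(size)
    return sizes

def correlation_length(s):
    """sqrt(mean cluster size) / L for an L x L site (assumed normalisation)."""
    s = np.asarray(s)
    return float(np.sqrt(np.mean(cluster_sizes(s))) / s.shape[0])

checkerboard = np.indices((4, 4)).sum(axis=0) % 2 * 2 - 1   # all clusters of size 1
uniform = np.ones((4, 4), dtype=int)                        # one single cluster
lam_min = correlation_length(checkerboard)                  # 1/L = 0.25
lam_max = correlation_length(uniform)                       # 1.0
```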

Alternatively the correlation length of a finite system can be defined by the second moment of an (appropriately 
normalised) correlation function [3(|. However, both definitions lead to the same qualitative behaviour of the correlation 
length as a function of the varying correlation parameters. The correlation length increases monotonically as 
a function of the correlation parameter $\gamma_p$ and saturates for sufficiently large values. Note also that in [13] the correlations 
on a finite surface were measured by a so-called patchiness which was defined to be basically the (suitably 
normalised) expectation value of the correlation energy $\sum_{\langle ij\rangle} \sigma_i\sigma_j$ in terms of our notation and convention. 

For simplicity the systems considered for the Monte Carlo simulations are of regular rectangular geometry and 
contain between 64 and 256 spin variables. Note that real recognition sites typically contain 30-40 residues; however, 
up to minor finite-size effects we find the same qualitative behaviour for systems of different sizes. As indicated in 
the introduction the energy contribution $\varepsilon$ of a non-covalent bond is only slightly stronger than the thermal energy 
at physiological conditions. We therefore typically choose $\beta\varepsilon \gtrsim O(1)$. In the following results we discuss the system 
with $\beta\varepsilon = 1$ if not stated otherwise. 
FIG. 1: Average complementarity of the probe ensemble with $\beta\varepsilon = 1$ as a function of the correlation length for targets with 
different hydrophobicities (solid lines, from top to bottom, $h_t = 0.5$, 0.4, 0.3, 0.2 and 0.1; the curve for $h_t = 0.0$ is not shown 
as it is hardly distinguishable from the one with $h_t = 0.1$ in the displayed range of $\lambda_p$). The correlation length of the targets is 
fixed to the value indicated by the black circle ($\lambda_t = 0.263$, corresponding e.g. to $\gamma_t = 0.4$ for $h_t = 0.0$). The optimum of the 
complementarity is slightly shifted to larger correlation lengths on the probe molecule. For the dashed curves $\beta\varepsilon = 1.5$ and 2.0 
(from the bottom up), again with $h_t = 0.5$. 

Consider a system with targets and rivals whose correlation length is relatively small so that the recognition sites 
consist of a relatively large number of rather small hydrophobic and polar patches. We investigated systems with 
hydrophobicities ranging from $h_{t/r} = 0.0$ to $h_{t/r} = 0.5$ and correlation lengths between $\lambda_{t/r} = 0.2$ and $\lambda_{t/r} = 0.35$ 
(note that the uncorrelated system with $\gamma_{t/r} = 0.0$ corresponds to a correlation length larger than the minimum 
length $\lambda_{t/r} = 1/L$ for a system with linear extension $L$ due to finite size effects). For all the systems we find the same 
qualitative behaviour; we therefore discuss as an example the system with $L = 16$ and $\lambda_{t/r} = 0.263$ in the following. 







In figure [1] the average complementarity of the designed probe molecules is shown as a function of varying correlation 
length $\lambda_p$ of the recognition site of the probe molecules for different hydrophobicities of the target molecules. It has 
to be noted first that the complementarity (as well as the selectivity, which is discussed below) is first enhanced by 
increasing correlations, reaches an optimum and finally decreases again. The probe molecules are expected to have a 
maximum complementarity if the patches of hydrophobic and polar residues on the target are matched by corresponding 
patches on the probe. However, the optimisation of the probe ensemble is carried out at a finite temperature 
and therefore thermal fluctuations limit the complementarity due to defects in the distribution of the interaction 
partner as the patches fray out at their boundaries. The position of the maximum of the average complementarity, 
which corresponds to the optimum choice of the correlation length of the probe molecules, is shifted to slightly larger 
values compared to the fixed correlation length of the target molecule. This signals the fact that a slightly larger 
correlation length compensates the appearance of defects in the boundaries of the patches during the design step and 
thus increases the complementarity. This effect is less pronounced if the temperature is decreased as defects appear 
less frequently. Notice also that the average complementarity tends to the fixed hydrophobicity $h_t$ of the target in the 
limit $\lambda_p \to 1$ as in this case the recognition site of the probe is made up of hydrophobic residues only (compare figure 
El). 

FIG. 2: The complementarity $[\langle K \rangle]/N$ and the free energy difference $[\Delta F]/N$ as a function of the correlation length of the probe
molecules. The correlation lengths of the target and rival molecules are fixed to the value shown by the circle ($\lambda_t = \lambda_r = 0.263$),
the corresponding hydrophobicities are $h_t = h_r = 0.5$ (solid line) and 0.4 (dashed line). Compared to the optimum for the
design of the probe molecules, the optimum of the recognition ability is clearly shifted to smaller values of the correlation length
on the probe molecule (optima indicated by arrows for $h_t = 0.5$). Additionally, the complementarity of the probe ensemble
with respect to the rival molecules is shown for $h_t = 0.4$ (dotted line). Notice that the system for the shown data has a linear
extension $L = 16$ and hence the minimum possible correlation length is $\lambda_p \approx 0.06$; the uncorrelated system with $\gamma_p = 0$ has
$\lambda_p \approx 0.16$.

For the uncooperative model (U) of the direct contact energy at the interface between the biomolecules, the free energy
difference is determined by the difference in the complementarity of the probe ensemble with respect to the target
molecules and the rival molecules, respectively (compare relation (10)). In figure 2 (upper part) the complementarity
with the rival molecules is shown in comparison with the one with respect to the target as a function of the correlation
length $\lambda_p$. The probe ensemble is always more complementary to the target, with respect to which it has been
optimised during the design step. For an increasing correlation length on the probe molecule the complementarity
with respect to the rival sequence increases until it finally reaches the maximum possible value for $\lambda_p \to 1$. In
this case the probe is no longer structured and hence cannot discriminate between different sequences.
In figure 3 the distributions $D(K)$ of the complementarity parameter with respect to the target and with respect to
the rival molecules (averaged over all target and rival sequences) are compared for two different correlation lengths.
For probe molecules with small structure elements with a characteristic length close to the optimum value,
the two distributions are well separated and hence the probe can discriminate between the two molecules. For increasing
correlation length and hence diminishing structuring of the probe molecules the two distributions approach each other
and the selectivity therefore decreases. This is accompanied by a broadening of the distributions when moving away from
correlation lengths that correspond to the optimum conditions for the selectivity. For $\lambda_p \to 1$ the two distributions
eventually become identical. Similarly, the two distributions converge towards each other when the correlation
length is decreased to the minimum possible value.
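
The link between the separation of the two distributions and the selectivity can be made concrete with a small sketch. It assumes that the complementarity is $K = \sum_i \sigma_i \theta_i$ for sequences with entries $\pm 1$; the function names and the use of an overlap coefficient as a separation measure are illustrative choices, not taken from the text.

```python
from collections import Counter

def complementarity(sigma, theta):
    """K = sum_i sigma_i * theta_i for two +/-1 sequences (assumed definition)."""
    return sum(s * t for s, t in zip(sigma, theta))

def distribution(k_samples):
    """Normalised empirical distribution D(K) from a list of sampled K values."""
    counts = Counter(k_samples)
    total = len(k_samples)
    return {k: c / total for k, c in counts.items()}

def overlap(k_target, k_rival):
    """Overlap coefficient sum_K min(D_t(K), D_r(K)).

    0 means the two distributions are fully separated (good discrimination),
    1 means they are identical (no discrimination).
    """
    d_t = distribution(k_target)
    d_r = distribution(k_rival)
    return sum(min(p, d_r.get(k, 0.0)) for k, p in d_t.items())
```

In this picture the optimum correlation length corresponds to a small overlap, while the limit of an unstructured probe corresponds to an overlap approaching one.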

Figure 2 shows the free energy difference of the interaction of the probe molecules in a system with target and rival
molecules, again as a function of the correlation length of the probe molecules. Note that the hydrophobicity $h_p$ in (12)
exhibits a dependence on $\lambda_p$. For $\lambda_p \to 1$ the free energy difference has to vanish, as the probe molecule then consists only
of amino acids of the same class and hence cannot distinguish on average between different sequences.
The minimum of the free energy difference corresponds to a system with optimum recognition ability. The
numerical results show that for recognition sites of the target with an excess of hydrophobic residues the optimum of
the recognition ability is clearly shifted to smaller values of the correlation length compared to the
optimum in the design of the probe molecules. The reason for this shift is that the structure elements of
the recognition sites influence the contributions of the target-probe interactions to the free energy difference, whereas
the rival-probe interactions do not feel these structure elements. A smaller correlation length implies
a larger number of smaller patches on the recognition site of the probe molecule and hence an entropic benefit
for the interaction with the target, because there are more ways for the two sites to align favourably. This effect does not
contribute to the free energy of the rival-probe interactions, as it is insensitive to a matching of structure elements
(compare relation (12) and the discussion there). The shift of the optimum correlation length also
means that the design of the probe molecules need not be carried out as effectively as one might naively expect:
the system is at liberty to carry out the design in a slightly suboptimal way without losing the optimum
recognition ability.

FIG. 3: Distribution of the complementarity of the probe ensemble with respect to the target molecules (solid line) and the
rival molecules (shaded curve) for different correlation lengths on the recognition site of the probe molecules. On the left hand
side the correlation length is $\lambda_p = 0.25$, on the right hand side $\lambda_p = 0.75$. The hydrophobicities of the target and rival molecules
are $h_t = h_r = 0.4$, the correlation lengths are $\lambda_t = \lambda_r = 0.263$ in each case.

Interestingly, this shift of the optimum correlation length depends on the value of the hydrophobicity of the target
and rival molecule. Figure 4 shows that the shift vanishes for recognition sites with the same number of hydrophobic
and polar residues (as is clear from relation (12)) and increases with increasing hydrophobicity. Note that in nature
recognition sites with different hydrophobicities show up for proteins with different biological function. In
enzyme-inhibitor complexes one typically finds largely hydrophobic interfaces, whereas the hydrophobicity in antibody-antigen
interfaces is significantly lower [jj [3(3].

Although the recognition sites in real systems always show extended patches of either hydrophobic or polar amino
acids [35], we briefly discuss systems where no nearest neighbour correlations appear in the distribution of the residues
on the target and rival molecule. As a consequence the recognition site is rather diffuse on average with respect to
the distribution of hydrophobic and polar residues. The hydrophobicity of the corresponding recognition sites is
nevertheless fixed to a certain value and the correlation length due to nearest neighbour correlations is varied on the
recognition site of the probe molecules to find the optimum selectivity. The results for different hydrophobicities are
depicted in figure 5. The correlation parameter at which the optimum complementarity of the probe molecules with
respect to the target molecules shows up depends on the hydrophobicity of the target and is shifted to values larger
than zero for positive hydrophobicities. In this case the probe molecules prefer a correlated, i.e. patch-structured,
surface although the target surface is uncorrelated and thus unstructured. The free energy, on the other hand,
always has its optimum for uncorrelated probe molecules. So again the design need not be carried out in the optimal way,
but, unlike in the case of correlated targets and rivals, correlations do not enhance selectivity.

Finally we compare our results to the findings of the work by Lukatsky and Shakhnovich, who investigated the
influence of correlated density distributions at the interface between biomolecules [27]. From their study they deduced that
the presence of correlations is a basic principle for recognition between proteins and leads to an enhanced probability
to find such interfaces as hub-hub interactions in protein-protein networks. In our work we consider correlations in
the distribution of hydrophobic and polar residues within the surface of the biomolecules. We basically reach the
same conclusions as Lukatsky and Shakhnovich. The corresponding correlations lead to lower binding energies for
moderately correlated interfaces, as is indicated by the increase of the averaged complementarity shown in figures
1 and 2. This points to a universal importance of (different) correlation effects to ensure the necessary specificity of
recognition processes. Our approach contains an additional design step where the two recognising proteins are
optimised with respect to each other. Note that the expression "design" has been used in [27] to refer to the emergence
of correlations.

FIG. 4: The shift of the optimum value of the correlation length for the recognition ability compared to the optimum value for
the complementarity as a function of the hydrophobicity of the target (note that $h_t = h_r$). Instead of error bars some of the
results from the Monte Carlo runs (open circles) are shown together with the results of the analysis of the data (full circles).
The dashed curve is a quadratic fit to the data (see discussion in section IV B).

FIG. 5: The complementarity $[\langle K \rangle]/N$ and the free energy difference $[\Delta F]/N$ as a function of the correlation parameter $\gamma_p$ and
of the correlation length $\lambda_p$, respectively, of the probe molecules. The correlation parameters of the target and rival molecules
are set to zero, the corresponding hydrophobicities are fixed to the values $h_r = h_t = 0.5$ (solid curve), 0.25 (dashed line) and 0.0
(dotted line). The free energy difference has an optimum for the correlation parameter $\gamma_p = 0.0$; the optimum complementarity,
however, is shifted to larger values.



B. Mean field approximation 

The averages in expression (12) of the free energy difference cannot be evaluated analytically; however, progress can
be made by applying a mean field approximation. Introducing the variable $k_i = \frac{\mu_p}{\beta} + \frac{\varepsilon}{2}\sigma_i^{(t)}$, the effective Hamiltonian
that describes the distribution of the sequence of the probe molecules after the design step has been carried out is
given by

$$
\mathcal{H}_{\mathrm{eff}} = -\frac{\gamma_p}{\beta} \sum_{\langle ij \rangle} \theta_i \theta_j - \sum_i k_i \theta_i \tag{13}
$$
dropping an irrelevant temperature-dependent constant. The variable $k_i$ can be interpreted as a random variable
whose probability is determined by the distribution $W^{(t)}$ of the target sequence. The system can therefore be viewed
as a random field Ising model. The mean field treatment in the form of the equivalent neighbour approximation
amounts to replacing $\mathcal{H}_{\mathrm{eff}}$ by

$$
\mathcal{H}^{(\mathrm{MF})}_{\mathrm{eff}}(\theta) = -\frac{\gamma_p}{2N\beta} \Bigl(\sum_i \theta_i\Bigr)^2 - \sum_i k_i \theta_i . \tag{14}
$$



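The expectation value $\langle \theta_i \rangle_{P(\theta|\sigma^{(t)})}$ in (11) is then given by the derivative

$$
\langle \theta_i \rangle_{P(\theta|\sigma^{(t)})} = \frac{\partial}{\partial k_i} G_{\mathrm{eff}} \tag{15}
$$

where the effective free energy $G_{\mathrm{eff}}$ is related to the Hamiltonian $\mathcal{H}^{(\mathrm{MF})}_{\mathrm{eff}}$ by

$$
G_{\mathrm{eff}} = \frac{1}{\beta} \ln Z_{\mathrm{eff}} \tag{16}
$$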



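with $Z_{\mathrm{eff}} = \sum_{\theta} \exp\bigl(-\beta \mathcal{H}^{(\mathrm{MF})}_{\mathrm{eff}}\bigr)$. The effective partition function $Z_{\mathrm{eff}}$ can be calculated in the large $N$ limit by first
using the identity

$$
\exp\Bigl(\frac{a}{2N} x^2\Bigr) = \sqrt{\frac{Na}{2\pi}} \int_{-\infty}^{\infty} dy \, \exp\Bigl(-\frac{Na}{2} y^2 + a x y\Bigr) \tag{17}
$$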



(with $a > 0$) so that the variable $x := \sum_i \theta_i$ appearing quadratically in the Boltzmann factor of $Z_{\mathrm{eff}}$ is linearised
and hence the summation over $\theta$ can be carried out. The price to pay for this linearisation is the introduction of the
auxiliary variable $y$. Omitting irrelevant prefactors, the effective partition function is then given by

$$
Z_{\mathrm{eff}} = \int dy \, \exp\bigl(N A(y, k)\bigr) \tag{18}
$$



with the argument

$$
A(y, k) = -\frac{\gamma_p}{2} y^2 + \frac{1}{N} \sum_i \ln\cosh(\gamma_p y + \beta k_i) \tag{19}
$$



where $k$ denotes the configuration $(k_1, \ldots, k_N)$. The Laplace method allows an asymptotic evaluation of (18) in the
large $N$ limit leading to

$$
\beta G_{\mathrm{eff}} = N A(y_0, k) = -N \frac{\gamma_p}{2} y_0^2 + \sum_i \ln\cosh(\gamma_p y_0 + \beta k_i) \tag{20}
$$



with the so-called mean field $y_0$ determined by the saddle point equation

$$
y_0 = \frac{1}{N} \sum_i \tanh(\gamma_p y_0 + \beta k_i) . \tag{21}
$$
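
The step from the linearised Boltzmann factor to the saddle point equation can be spelled out as follows. This is a sketch that assumes the mean field Hamiltonian has the form $\mathcal{H}^{(\mathrm{MF})}_{\mathrm{eff}} = -\frac{\gamma_p}{2N\beta}(\sum_i \theta_i)^2 - \sum_i k_i \theta_i$ with $\theta_i = \pm 1$, applies the identity with $a = \gamma_p$, and drops constant prefactors such as $2^N$:

```latex
\begin{align*}
Z_{\mathrm{eff}}
  &= \sum_{\theta} \exp\Bigl(\frac{\gamma_p}{2N} x^2 + \beta \sum_i k_i \theta_i\Bigr),
     \qquad x = \sum_i \theta_i \\
  &\propto \int dy \, e^{-N \gamma_p y^2 / 2}
     \sum_{\theta} \prod_i e^{(\gamma_p y + \beta k_i)\theta_i}
   = \int dy \, e^{-N \gamma_p y^2 / 2} \prod_i 2\cosh(\gamma_p y + \beta k_i) \\
  &\propto \int dy \, \exp\bigl(N A(y,k)\bigr), \\
0 &= \frac{\partial A}{\partial y}\Big|_{y_0}
   = -\gamma_p y_0 + \frac{\gamma_p}{N} \sum_i \tanh(\gamma_p y_0 + \beta k_i)
   \;\Longrightarrow\;
   y_0 = \frac{1}{N} \sum_i \tanh(\gamma_p y_0 + \beta k_i).
\end{align*}
```

The sum over $\theta$ factorises only because the auxiliary variable $y$ has removed the quadratic coupling between the sites.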



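Note that the mean field depends explicitly on the sequence of the recognition site of the target. Having computed
an expression for the effective free energy $G_{\mathrm{eff}}$ one can now calculate the desired average

$$
\langle \theta_i \rangle_{P(\theta|\sigma^{(t)})} = \frac{\partial}{\partial k_i} G_{\mathrm{eff}} = \tanh\Bigl(\gamma_p y_0 + \mu_p + \frac{\beta\varepsilon}{2} \sigma_i^{(t)}\Bigr) . \tag{22}
$$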



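Additionally one has $\sum_i \langle \theta_i \rangle_{P(\theta|\sigma^{(t)})} = N y_0$ so that the mean field gives the expectation value of the hydrophobicity
of the probe ensemble. The free energy difference (12) is then generally given by

$$
\Delta F = -\frac{\varepsilon}{2} \Bigl[ \sum_i \sigma_i^{(t)} \tanh\Bigl(\gamma_p y_0 + \mu_p + \frac{\beta\varepsilon}{2} \sigma_i^{(t)}\Bigr) \Bigr]_{W^{(t)}} + \frac{\varepsilon}{2} N h_r \, [y_0]_{W^{(t)}} \tag{23}
$$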
where averages over the target and the rival sequences still have to be carried out.

Starting from expression (23) these averages can be carried out numerically. The mean field $y_0$, which is determined
by the saddle point equation (21), explicitly depends on the target sequence $\sigma^{(t)}$ and hence one has of the order
of $e^N$ saddle point equations for a system with $N$ residues. A particular configuration, however, contains $\Sigma^{(+)}$
hydrophobic residues and $\Sigma^{(-)}$ polar ones. For such a configuration the saddle point equation is given implicitly by
the equation

$$
y_0(\Sigma^{(+)}, \Sigma^{(-)}) = \frac{\Sigma^{(+)}}{N} \tanh\Bigl(\gamma_p y_0 + \mu_p + \frac{\beta\varepsilon}{2}\Bigr) + \frac{\Sigma^{(-)}}{N} \tanh\Bigl(\gamma_p y_0 + \mu_p - \frac{\beta\varepsilon}{2}\Bigr) \tag{24}
$$

and hence the mean field depends only on the numbers $(\Sigma^{(+)}, \Sigma^{(-)})$ for a given configuration. This observation
drastically reduces the number of saddle point equations. The remaining equations can be solved using computer
algebra programmes; the average with respect to the distribution $W^{(t)}$ can be carried out afterwards. A distribution
of the form (6) can be expressed in terms of the density of states $\Omega(\Sigma^{(+)}, \Sigma^{(-)}, E)$ specifying the number of
target configurations that are compatible with the macroscopic parameters $\Sigma^{(+)}$, $\Sigma^{(-)}$ and $E$, where $E$ denotes the
correlation energy. For fairly small systems this density of states can be calculated exactly by suitable enumeration
algorithms jH], for large systems effective Monte Carlo techniques can be applied [39, 40, 41].
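
As an illustration of how one such reduced saddle point equation might be solved numerically, the following minimal sketch uses plain bisection instead of a computer algebra programme. It assumes the equation has the form $y_0 = \frac{\Sigma^{(+)}}{N}\tanh(\gamma_p y_0 + \mu_p + \beta\varepsilon/2) + \frac{\Sigma^{(-)}}{N}\tanh(\gamma_p y_0 + \mu_p - \beta\varepsilon/2)$; the function name `saddle_point` and its parameters are hypothetical. For $\gamma_p < 1$ the right hand side has slope below one, so the root is unique; for larger $\gamma_p$ several solutions may exist and the routine returns only one of them.

```python
import math

def saddle_point(n_plus, n_minus, gamma_p, beta_eps, mu_p=0.0, tol=1e-12):
    """Solve the assumed saddle point equation for the mean field y0 by bisection.

    n_plus, n_minus: numbers of hydrophobic and polar residues on the target
    gamma_p:         correlation parameter of the probe
    beta_eps:        dimensionless contact energy beta * epsilon
    mu_p:            dimensionless field controlling the probe hydrophobicity
    """
    n = n_plus + n_minus

    def f(y):
        rhs = (n_plus / n) * math.tanh(gamma_p * y + mu_p + beta_eps / 2)
        rhs += (n_minus / n) * math.tanh(gamma_p * y + mu_p - beta_eps / 2)
        return rhs - y

    # The right hand side lies strictly between -1 and 1,
    # so f(-1) > 0 and f(1) < 0 and a root is bracketed.
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $\gamma_p = 0$ the result reduces to $h_t \tanh(\beta\varepsilon/2)$ with $h_t = (\Sigma^{(+)} - \Sigma^{(-)})/N$, which provides a simple consistency check.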

The mean field treatment reproduces the qualitative results of the numerical investigations discussed in subsection
IV A. For instance, the complementarity of the probe ensemble and the free energy difference as a measure of the
recognition ability of the probe-target system in the presence of a rival molecule can now be worked out as a function of
the correlation parameter $\gamma_p$. Again a characteristic shift of the optimum correlation parameter and hence correlation
length for the two quantities can be observed, in accordance with the numerical Monte Carlo findings discussed above.

The mean field result can be used to consider the case of a small correlation parameter $\gamma_p$ (with $\mu_p = 0$) in more
detail. The implicit saddle point equation (24) can be expanded into a power series in $\gamma_p$ and solved up to order $\gamma_p^2$.
This gives

$$
y_0 = h_t A + \gamma_p h_t B + \gamma_p^2 C_1 \tag{25}
$$

with the numerical constants being $A = \tanh(\beta\varepsilon/2)$, $B = \tanh(\beta\varepsilon/2)\,\mathrm{sech}^2(\beta\varepsilon/2)$ and $C_1 = B(h_t - h_t^3 \sinh^2(\beta\varepsilon/2))$.
Note that $y_0$ still depends on the particular sequence of the target through the dependency on the hydrophobicity
$h_t = h_t(\sigma^{(t)}) = \frac{1}{N}\sum_i \sigma_i^{(t)} = (2\Sigma^{(+)} - N)/N$. Using (25) the complementarity of the probe ensemble averaged over
all possible target sequences can be computed up to order $\gamma_p^2$ giving

$$
\frac{1}{N} \bigl[ \langle K(\sigma^{(t)}) \rangle \bigr]_{W^{(t)}} = A + \gamma_p [h_t^2] B + \gamma_p^2 [h_t^2] C_2 \tag{26}
$$



with $C_2 = B(1 - \sinh^2(\beta\varepsilon/2))$. The complementarity is determined in this limit by the second moment of the
hydrophobicity distribution of the target molecules and hence directly feels the structure of the hydrophobic and
polar patches on the recognition site of the target. For sufficiently large $\beta\varepsilon$ this expression has a maximum at a
correlation parameter $\gamma_p^K = -B/(2C_2)$. Note that the position of the maximum is independent of the properties of
the distribution of the target sequences in the considered situation of a small correlation parameter for the probe
molecules; in particular it is independent of the chosen hydrophobicity of the target molecules. The numerical Monte
Carlo data shown in figure 1 seem to be in accordance with this observation (the data are shown as a function of the
correlation length; the maximum shows up at a fairly small correlation length and hence a small correlation parameter).
The position where the maximum appears is shifted to smaller values of the correlation parameter and thus correlation
length for increased $\beta\varepsilon$. This is again confirmed by the numerical data in figure 1. Similarly the free energy difference
can be worked out as a second order Taylor polynomial in $\gamma_p$. It shows a minimum at a correlation parameter $\gamma_p^F$. The
shift $\Delta\gamma_p = \gamma_p^F - \gamma_p^K$ can be expressed in terms of the moments of the distribution of the hydrophobic residues on the
recognition sites of the target and the rival molecules, respectively:

$$
\Delta\gamma_p = \frac{B\bigl([h_t][h_r] - [h_t^2]\bigr)}{2\bigl(C_2 [h_t^2] - C_1 [h_r]\bigr)} + \frac{B}{2C_2} \tag{27}
$$

Note that $C_1$ depends on $[h_t]$. For the special case where the two types of molecules exhibit the same distribution
one has $[h_t] = [h_r] = [h]$. The shift is then dominated by $\Delta\gamma_p \sim [h]^2$ in the asymptotic limit of small values of the
hydrophobicity $[h]$. Assuming a linear relation between the correlation length $\lambda_p$ and the correlation parameter $\gamma_p$
in the parameter range where the shift appears (an assumption which should be valid if the shift is small), one
also has $\Delta\lambda_p \sim [h]^2$. The numerical Monte Carlo data in figure 4 are consistent with this observation, although it
should be stressed that the quality of the shown numerical data is not good enough to deduce reliable quantitative
statements.
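
The condition "sufficiently large $\beta\varepsilon$" for the maximum of the complementarity can be made explicit with a short sketch. It takes the constants exactly as quoted above, $\gamma_p^K = -B/(2C_2)$ with $C_2 = B(1 - \sinh^2(\beta\varepsilon/2))$, so that $B$ cancels; the function name `gamma_K` and the constant `THRESHOLD` are hypothetical, and the result is only meaningful where the small-$\gamma_p$ expansion applies.

```python
import math

# C2 is negative, and a maximum exists, only if sinh^2(beta_eps / 2) > 1,
# i.e. beta_eps > 2 * asinh(1).
THRESHOLD = 2.0 * math.asinh(1.0)

def gamma_K(beta_eps):
    """Position -B / (2 * C2) of the complementarity maximum, or None if absent.

    Uses C2 = B * (1 - sinh^2(beta_eps / 2)) as quoted in the text, which gives
    -B / (2 * C2) = 1 / (2 * (sinh^2(beta_eps / 2) - 1)).
    """
    s2 = math.sinh(beta_eps / 2.0) ** 2
    if s2 <= 1.0:
        return None  # C2 >= 0: the second order polynomial has no maximum
    return 1.0 / (2.0 * (s2 - 1.0))
```

Within this approximation the position of the maximum decreases monotonically with increasing $\beta\varepsilon$ above the threshold, in line with the trend described above.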

The mean field treatment has been used in this section to obtain an expression for the dependence of the shift of the
optimum correlation lengths for the complementarity and the selectivity on the hydrophobicity of the
target and rival molecules, respectively. To this end, an expansion in the correlation parameter $\gamma_p$ was carried
out; subsequently an average over the correlated target and rival molecules was performed. The coefficients of the
series in $\gamma_p$ therefore basically depend on the moments of the hydrophobicity distribution of these molecules. It has
to be noted in this context that the power series in $\gamma_p$ is only an asymptotic one, as for the limit $\gamma_p \to 0$ the
Hubbard-Stratonovich transformation (17) cannot be applied. Nevertheless, the mean field treatment gives reasonable results
for the system with correlated target and rival molecules, as the optima of the complementarity and the selectivity
show up at non-zero values of the correlation parameter $\gamma_p$. In the case of uncorrelated target and rival molecules,
however, this is not the case (compare figure 5) and thus the mean field treatment in the discussed framework is not
applicable.



V. MODEL OF DOMINANT COOPERATIVITY 



In the previous section the constant $J$ of the cooperative interaction term in (1) has been set to zero so that only the
direct contact interactions due to the hydrophobic effect contribute. In this section the influence of these additional
terms is taken into account. This is done by considering the case where the cooperative interactions dominate over
the direct contact interactions. In [28] it has been argued that the Hamiltonian can be approximated by

$$
\mathcal{H}_{\mathrm{int}}(\sigma, \theta; s) = -\varepsilon \frac{1+s}{2} \sum_{i=1}^{N} \sigma_i \theta_i \tag{28}
$$

in this case with the new (global) interaction variable $s$ taking on the two possible values $\pm 1$. Summing out the
variable $s$ and dropping irrelevant constants one ends up with the effective Hamiltonian

$$
\mathcal{H}_{\mathrm{eff}}(\sigma, \theta) = -\frac{\varepsilon}{2} \sum_{i=1}^{N} \sigma_i \theta_i - \frac{1}{\beta} \ln\cosh\Bigl(\frac{\beta\varepsilon}{2} \sum_{i=1}^{N} \sigma_i \theta_i\Bigr) \tag{29}
$$



for the sequence $\theta$ of the probe molecule interacting with a molecule whose sequence at its recognition site is specified
by $\sigma$. Incorporating the correlation terms (4), the two-stage approach to calculate the recognition ability for a system
with particular sequences for the target and rival molecules can be carried out. The free energy difference for the
interaction of the probe molecules with the target and the rival molecules, respectively, is then given by



[AF] = - 




tf/(t) 



Incosh |5>«< 



(30) 



(31) 



The remaining averages in this expression of the free energy difference can again be worked out by means of Monte
Carlo simulations. In figure 6 the complementarity of the probe ensemble together with the free energy difference
is depicted as a function of the correlation length of the probe molecules. Again the hydrophobicity of the target
and rival molecules is fixed, while the hydrophobicity of the probe ensemble is unrestricted (i.e. $\mu_p = 0$) and adjusts itself
during the design step. The data again reveal a shift in the optimum correlation length for the recognition ability
compared to the optimum value for the complementarity, although this shift is somewhat less pronounced than in
the model with $J = 0$. Thus the findings for the uncooperative model are reproduced qualitatively for the model
with additional cooperative interactions. Nevertheless a minor difference is visible. Whereas the optimum correlation
length with respect to the complementarity of the probe molecules is clearly shifted to a larger value compared to
the fixed correlation length of the target molecule in the case of the uncooperative model (compare figure 1), the
optimum appears (within the accuracy of the numerics) at the same correlation length for the model with dominant
cooperativity. This is because the cooperative interactions lead to the formation of extended patches of
good contacts [28] and thus to an effective reduction of the appearance of defects in the design step, which can also be
seen from the fact that the average complementarity at the optimum correlation length is larger for the cooperative
model (see figures 1 and 6). Thus defects need not be compensated by slightly extending the size of the hydrophobic
and polar patches due to correlation effects.

FIG. 6: The complementarity $[\langle K \rangle]/N$ and the free energy difference $[\Delta F]/N$ of the system with dominant cooperative
interactions as a function of the correlation length of the probe molecules. The correlation lengths of the target and rival
molecules are fixed to the value shown by the circle, the corresponding hydrophobicities are $h_r = h_t = 0.5$ (solid line) and
$h_r = h_t = 0.0$ (dashed line). The optimum correlation length for the recognition ability is clearly shifted to a value below the
optimum value for the design of the probe ensemble for the interface with non-zero hydrophobicity.



As in the case of the uncooperative model (JH) the distribution function of the complementarity parameter of the
probe ensemble with respect to the target and rival molecules, respectively, can be investigated. The corresponding
curves in figure 7 reveal that one ends up with the same qualitative results as in the case of the uncooperative model.
Note that the two distributions are well separated from each other and that the distribution of the complementarity
with the target molecules is fairly narrow for the correlation length that corresponds to a large complementarity and
selectivity. The width of the distribution of the complementarity with the target is noticeably reduced compared to the
width of the distribution for the uncooperative model (compare figure 3).

FIG. 7: Distribution of the complementarity of the probe ensemble with respect to the target molecules (solid line) and the
rival molecules (shaded curve) for different correlation lengths on the recognition site of the probe molecules within the model
of dominant cooperativity [28]. On the left hand side the correlation length is $\lambda_p = 0.25$, on the right hand side $\lambda_p = 0.75$. The
hydrophobicities are $h_t = h_r = 0.4$, the correlation lengths of the target and rival molecules are $\lambda_t = \lambda_r = 0.263$ in each case.

In principle the same numerical analysis of the recognition ability can be carried out for arbitrary values of the
cooperative interaction constant $J$ in (1), although in this case an expression like (30) for the free energy cannot be
worked out and thus the numerical effort is much increased. The free energy can be computed, for example, from the
density of states that can be evaluated by means of suitable Monte Carlo methods [39, 40, 41, 42]. As we expect the
qualitative physical behaviour not to change, we do not proceed with such systems in this work.

VI. SUMMARY AND OUTLOOK

In previous studies we developed coarse-grained lattice models to analyse statistical properties of molecular
recognition processes between rigid biomolecules such as proteins [l(J US Hfl . The general approach consists of two stages,
where a design of probe molecules with respect to a given target molecule is carried out first. Afterwards the
recognition ability of the probe molecules in a heterogeneous environment with rival molecules is evaluated. Note that the
design step is carried out in the absence of rival molecules whereas the testing step includes rival molecules that are
structurally different from the target, but compete with it for the probe molecules. In the present work we extended
our previous models and incorporated sequence correlations into our coarse-grained Hamiltonian of the interactions
across the interface of the two proteins. These correlations affect the distribution of hydrophobic and polar residues
on the surfaces of the proteins. We investigated the extended models by numerical Monte Carlo simulations and by
mean field methods. Both approaches lead to the same qualitative results. In particular we computed the correlation
length at which the optimum of the complementarity of the design step appears. The free energy difference, which
specifies the selectivity of the target-probe interaction in the presence of rival molecules, shows an optimum at a
correlation length that is different from the one corresponding to the optimum of the design step. This shift opens up
the opportunity to carry out the design slightly away from the optimum without losing selectivity. This
might be relevant in the context of harmful effects due to point mutations during evolution, which our design step is
intended to mimic. In principle it should be possible to check for the appearance of two different correlation lengths for
the recognition sites of the two proteins that form a complex using experimental structural data. However, we do not
know of a corresponding study of this issue.